Sense Tagging the Penn Treebank
Abstract
This paper describes the methodology that is being used to augment the Penn Treebank annotation with sense tags and other types of semantic information. Inspired by the results of SENSEVAL, and the high inter-annotator agreement that was achieved there, similar methods were used for a pilot study of 5000 words of running text from the Penn Treebank. Using the same techniques of allowing the annotators to discuss difficult tagging cases and to revise WordNet entries if necessary, comparable inter-annotator agreement rates have been achieved. The criteria for determining appropriate revisions and ensuring clear sense distinctions are described. We are also using hand correction of automatically generated predicate-argument structure information to provide additional thematic role labeling.
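As an illustration of how inter-annotator agreement on sense tags might be measured, the sketch below computes simple pairwise observed agreement and Cohen's kappa for two annotators. The abstract does not state which agreement measure was used, so both functions are assumptions for illustration, and the sense labels (`bank%1`, `bank%2`) are hypothetical rather than real WordNet sense keys.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of tokens on which two annotators chose the same sense tag."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same tokens")
    if not tags_a:
        raise ValueError("no tagged tokens to compare")
    matches = sum(a == b for a, b in zip(tags_a, tags_b))
    return matches / len(tags_a)


def cohen_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    p_o = observed_agreement(tags_a, tags_b)
    n = len(tags_a)
    counts_a = Counter(tags_a)
    counts_b = Counter(tags_b)
    # Expected agreement if each annotator assigned tags independently,
    # following their own observed tag distribution.
    p_e = sum(counts_a[t] * counts_b[t] for t in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical tag throughout
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical sense tags for four occurrences of the same word.
annotator_a = ["bank%1", "bank%2", "bank%1", "bank%1"]
annotator_b = ["bank%1", "bank%2", "bank%2", "bank%1"]

print(observed_agreement(annotator_a, annotator_b))  # 0.75
print(cohen_kappa(annotator_a, annotator_b))         # 0.5
```

Observed agreement is easy to interpret but is inflated when one sense dominates; kappa corrects for that by subtracting the agreement expected by chance.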
Related papers
Semantic Tagging for the Penn Treebank
This paper describes the methodology that is being used to augment the Penn Treebank annotation with sense tags and other types of semantic information. Inspired by the results of SENSEVAL, and the high inter-annotator agreement that was achieved there, similar methods were used for a pilot study of 5000 words of running text from the Penn Treebank. Using the same techniques of allowing the ann...
Evaluating Automatic Semantic Taggers
Unlike the problems of part-of-speech tagging and parsing, where commonly utilized training and test sets such as the Brown Corpus and Penn Treebank have existed for a number of years, evaluation of word sense disambiguation systems is not yet standardized. In fact, most previous work in sense disambiguation has tended to use different sets of polysemous words, different sense inventories, diffe...
Parsing the Penn Chinese Treebank with Semantic Knowledge
We build a class-based selection preference sub-model to incorporate external semantic knowledge from two Chinese electronic semantic dictionaries. This sub-model is combined with a modifier-head generation sub-model. After being optimized on held-out data by the EM algorithm, our improved parser achieves 79.4% (F1 measure), as well as a 4.4% relative decrease in error rate on the Penn Chines...
The Penn Treebank: an Overview
The Penn Treebank, in its eight years of operation (1989-1996), produced approximately 7 million words of part-of-speech tagged text, 3 million words of skeletally parsed text, over 2 million words of text parsed for predicate-argument structure, and 1.6 million words of transcribed spoken text annotated for speech disfluencies. This paper describes the design of the three annotation schemes use...